Abstract —The problem of quickest detection of a change in the
distribution of an $n \times p$ random matrix based on a sequence of
observations having a single unknown change point is considered. 
The forms of the pre- and post-change distributions of the 
rows of the matrices are assumed to belong to the family of 
elliptically contoured densities with sparse dispersion matrices 
but are otherwise unknown. A non-parametric stopping rule 
is proposed that is based on a novel scalar summary statistic 
related to the maximal k-nearest neighbor correlation between 
columns of each observed random matrix, and is related to a 
test of existence of a vertex in a sample correlation graph having 
degree at least k. Performance bounds on the delay and false 
alarm performance of the proposed stopping rule are obtained. 
When the pre-change dispersion matrix is diagonal it is shown 
that, among all functions of the proposed summary statistic, 
the proposed stopping rule is asymptotically optimal under a 
minimax quickest change detection (QCD) model, in the purely 
high-dimensional regime of $p \to \infty$ and $n$ fixed. The significance
is that the purely high-dimensional asymptotic regime considered
here is asymptotic in $p$ and not $n$, making it especially well suited
to big data regimes. The theory developed also applies to fixed 
sample size tests. 

Index Terms —Big data, correlation change detection, correlation
screening, correlation mining, generalized likelihood ratio
test, misspecification of distribution, quickest change detection, 
summary statistic. 

I. Introduction 

One of the greatest challenges in data analysis is to develop 
robust algorithms for statistical inference on large scale data. 
Many big data applications fall in the so-called sample-starved
regime [1], where conclusions have to be drawn or decisions
have to be made based on a small set of samples of a
high-dimensional vector. Most classical statistical tests have been
designed for the large-sample regime, where the number of
samples is much larger than the dimension of the vector, and
hence are not applicable to high-dimensional data analysis.
Thus, new approaches are needed to address these challenges. 

In this paper we consider the problem of detecting a change 
in maximal coherence between variables in high dimension 
with a limited number of samples. Specifically, assume that a 
sequence of high-dimensional vectors is available. Initially the 
vector sequence is i.i.d. with a nominal correlation matrix, i.e.,
a “normal” or “expected” baseline of multivariate correlations. 
At some time point in the sequence the correlation matrix 
may change, e.g., due to a certain unknown event, activity or 
disorder. The objective is to detect this change in correlation 
as quickly as possible. In other words, we want to quickly 
determine whether or not the sequence of high-dimensional 
vectors is inhomogeneous, i.e., if there is a “change point” in 
the sequence beyond which the correlation structure changes. 
In many applications the change has to be detected in real 
time, i.e., with minimum possible delay while avoiding false 
alarms. Rapid and timely detection of disorder can potentially 
save the cost of acquiring the rest of the samples. 

This problem has applications to slippage problems in 
multivariate time-series analysis and financial stock analysis, 
anomaly detection in social networks and communication 
networks, and intrusion detection in sensor networks. In 
multivariate time-series analysis, it is of interest to know 
if the coefficients of the time series have abruptly changed 
over time. In stochastic finance, it is of interest to detect a 
sudden change in the correlation between a set of stocks being 
monitored. In social networks, it is of interest to detect an 
abrupt change in the interaction level between a pair of agents. 
In communication networks it is of interest to detect emergent 
hubs of highly correlated traffic flows over the network. Such 
a hub may be a potential point of attack by a cyber attacker. 
In sensor network intrusion detection, the presence of an 
intruder might affect the correlation between the observations 
at various sensors located near the intruder. 

The major challenges in this problem are:

1) The dimension of the vectors is much larger than the 
total number of available samples before and after the 
change point. 

2) The statistical properties of the vectors may not be 
precisely known, i.e., the problem is nonparametric in 
nature. 

We formulate this detection problem in the framework of 
quickest change detection (QCD) (see, e.g., 0, 0, and 0). 
In the QCD problem a decision maker observes a stochastic 
process over time. At some point in time, called the change 
point, the distribution of the process changes. The decision 
maker has to detect this change in distribution with minimum 
possible delay, subject to a constraint on false alarms. The 
QCD problem has been formulated in various ways in the 
literature. One prevalent formulation of the QCD problem is 
as a stochastic optimization problem, where the goal is to 
find a stopping time on the observed stochastic process so 
as to minimize a suitable metric on the delay, subject to a 
suitable metric on the false alarm rate. A typical solution 
is a stopping rule that reduces to a single threshold test,
where a sequence of statistics is computed over time, and a 
change is declared the first time the statistic exceeds a stopping 
threshold. The stopping threshold is chosen to control the rate 
of false alarms. The theoretical foundation for such sequential 
decision making was laid by Wald; see 0 0. The Bayesian 
version of the problem, where a prior on the change point is 
assumed, is developed in 0, 0, and 0. The QCD problem 
in non-Bayesian minimax settings has been formulated in pO) , 
|n), fig, (g, lig. In general, an optimal or asymptotically 
optimal solution to a QCD problem can be obtained only 
when the pre- and post-change distributions are known to the 
decision maker, or when the post-change distribution is in a 
parametric family. In the nonparametric setting, the problem 
in general is not tractable, that is, an optimal solution cannot 
be obtained. As a result, in the nonparametric setting the 
goal is often less ambitious than to find an optimal solution. 
Rather, a reasonable procedure is proposed and its properties 
are established, e.g., consistency, convergence rate, scalability, 
and so on. In this paper we propose a consistent and scalable 
nonparametric procedure for correlation change detection. 

Consider the following random matrix observation model. A
sequence of random matrices $\{X(m)\}$ is observed over time,
indexed by $m$, where each $X(m)$ is an $n \times p$ short and fat
random matrix. By a short and fat matrix we mean $p \gg n$.
The rows of these random matrices may correspond to approximately
independent realizations of $p$ different variables,
e.g., sampled over blocks of time or sampled in a sequence of 
repeated experiments. For example, in the case of detecting 
a change in the coefficients of a Gaussian univariate time 
series, p successive time samples may be acquired over n 
well separated blocks of time. A change in the coefficients 
of the time series is reflected in a change in the correlation 
matrix associated with each block. In stochastic finance, we 
may have access to multiple instances of stock values over a 
day or week, and a change in correlation may occur only at the 
end of the day or week. For example, we may have a total of 
500 samples of a 10,000-dimensional vector of stock closing
prices. These 500 samples are acquired periodically over time,
and an anomaly may occur at any time. These 500 samples
may be acquired 5 samples at a time. Thus, $p = 10{,}000$ and
$n = 5$, and we have a sequence of 100 samples of $5 \times 10{,}000$
random matrices.

If the distribution of the random matrices belongs to a parametric
family, and the parameter before the change is known,
then efficient procedures from the quickest change detection 
literature can be used for detection d), 03’ GD- However, 
as discussed above, in the absence of a parametric model 
for the data, or even a lack of knowledge of the pre-change 
parameter in a parametric model, a situation common in big 
data settings, no optimal procedures are known for detection 
of change. Also, because of the high-dimensional nature of 
the problem ($p \gg n$), it is difficult to design nonparametric
procedures. For example, in this setting the sample covariance
matrix is rank deficient and can be a very poor estimate of the
population covariance matrix of the vectors. In this paper we
propose a nonparametric procedure that can provably detect
a class of changes in population correlation in the observed
high-dimensional vectors.

Specifically, in Section II we consider the problem of
quickest detection of a change in population dispersion (or
correlation) matrix under the assumption that the rows of each
matrix $X(m)$ are independent and identically distributed, with
joint distribution from the nonparametric family of elliptically
contoured distributions. The precise mathematical problem is
stated in Section II. We propose a novel scalar summary
statistic $V(X)$ for the data matrix $X$ that is used as the
test statistic in the change detection procedure. The summary
statistic is the minimal size of the $k$-nearest neighborhood
among all the columns of the observed matrix, where size
is measured by the sample correlation associated with the
column and its $k$-nearest (most correlated) neighbors. We
obtain an approximate distribution for the summary statistic
in the sample-starved purely high-dimensional regime of
$p \to \infty$ with $n$ fixed and small. We show in Section III that
the distribution of the summary statistic belongs to a
one-parameter exponential family, with the unknown parameter a
function of the underlying distribution of the data matrix.

In this manner we map the sequence of $n \times p$ observed
data matrices $\{X(m)\}$ to a sequence of real-valued summary
statistics $\{V(X(m))\}$ whose distribution is in a known
parametric family for sufficiently large $p$ and finite $n$. A
change in distribution in the sequence $\{X(m)\}$ may cause a
change in the parameter of the distribution of $\{V(X(m))\}$. We
propose to detect this latter change by applying the generalized
likelihood ratio (GLR) test of Lorden [11]. However, the
change can be detected accurately only if the pre-change
parameter is known. Towards this end, we derive sufficient
conditions on the population dispersion matrix before the change
that guarantee that Lorden's test is either optimal or
consistent. Specifically, we provide a detailed performance
analysis of Lorden's test under misspecification of the pre-
and post-change distributions. We make these notions precise
in Section IV. In Section V we validate the effectiveness of the
proposed procedure by verifying the theoretical results through
numerical simulations.

To summarize the contributions of this paper: we propose
a nonparametric quickest detection procedure for detecting a
change in correlation in a sequence $\{X(m)\}$ of $n \times p$
high-dimensional matrices ($p \gg n$). The details are summarized below.

1) We propose a novel summary statistic $V(X)$ of the
data matrix $X$, which is the maximal sample correlation
between the columns of the matrix $X$.

2) We obtain an asymptotic distribution for $V(X)$ in the
purely high-dimensional regime of fixed $n$ and $p \to \infty$
under the assumption that the rows of $X$ are elliptically
distributed. The asymptotic distribution of $V(X)$ belongs
to a one-parameter exponential family.

3) The change is detected by applying Lorden's GLR test
to the summary statistic sequence $\{V(X(m))\}$.

4) We analyze the performance of Lorden's GLR test when
the pre- and post-change distributions are misspecified.

5) We obtain conditions on the pre- and post-change dispersion
matrices in the elliptical model of $X(m)$ for which
the change can be accurately detected.

II. Problem Description

A decision-maker sequentially acquires samples from a family
of distributions of $n \times p$ random matrices over time, indexed
by $m$, leading to the random matrix sequence $\{X(m)\}_{m \geq 1}$,
called data matrices. For each $m$ the data matrix $X(m)$ has
the following properties. Each of its $n$ rows is an independent
identically distributed (i.i.d.) sample of a $p$-variate random
vector $\mathbf{X}(m) = [X_1(m), \ldots, X_p(m)]^T$ with $p \times 1$ translation
parameter $\mu_m$ and $p \times p$ positive definite dispersion matrix
$\Sigma_m$. The random vector $\mathbf{X}(m)$ has an elliptically contoured
density, also called an elliptical density,

\[ f_{\mathbf{X}(m)}(x) = h_m\big((x - \mu_m)^T \Sigma_m^{-1} (x - \mu_m)\big), \]

for some unknown nonnegative strictly decreasing shaping
function $h_m$ on $\mathbb{R}^+$. If $\mu_m = 0$ and $\Sigma_m = I_p$, where $I_p$
is the $p \times p$ identity matrix, then the random vector $\mathbf{X}(m)$ is
said to have a spherical density.

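The data matrices $\{X(m)\}$ are assumed to be statistically
independent. For some time parameter $\gamma$ the samples have
common dispersion matrix $\Sigma_m = \Sigma_0$ and shaping function
$h_m = h_0$ for $m < \gamma$, and common dispersion matrix
$\Sigma_m = \Sigma_1 \neq \Sigma_0$ and shaping function $h_m = h_1$ for $m \geq \gamma$.
The scalar $\gamma$ is called the change point, and the pre-change
and post-change distributions of $X(m)$ are denoted by $f^0_X$ and
$f^1_X$, respectively. No assumptions are made about the mean
parameter $\mu_m$, which can take different values for different $m$.
More specifically, as the rows of $X(m)$ are i.i.d. realizations of
the elliptically distributed random vector $\mathbf{X}(m)$, this
change-point model is described by:

\[
\begin{aligned}
&\text{for } m < \gamma: \\
&\qquad X(m) \sim f^0_X(X) = \prod_{i=1}^{n} f_{\mathbf{X}(m)}(X_{(i)}), \\
&\qquad \mathbf{X}(m) \sim f_{\mathbf{X}(m)}(x) = h_0\big((x - \mu_m)^T \Sigma_0^{-1} (x - \mu_m)\big), \\
&\text{for } m \geq \gamma: \\
&\qquad X(m) \sim f^1_X(X) = \prod_{i=1}^{n} f_{\mathbf{X}(m)}(X_{(i)}), \\
&\qquad \mathbf{X}(m) \sim f_{\mathbf{X}(m)}(x) = h_1\big((x - \mu_m)^T \Sigma_1^{-1} (x - \mu_m)\big),
\end{aligned}
\tag{1}
\]

where $X_{(i)}$ denotes the $i$-th row of $X$.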
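
As a rough illustration of the change-point model (1), the sketch below generates a sequence of $n \times p$ data matrices whose rows are Gaussian, which is one member of the elliptically contoured family. The function name `simulate_sequence`, the identity pre-change dispersion, and the choice of correlating only the first two variables after the change are hypothetical choices made for this example, not part of the model above.

```python
import random

def simulate_sequence(n, p, num_matrices, change_point, rho, seed=0):
    """Return a list of n-by-p data matrices (each a list of n rows).

    Assumption for illustration: rows are Gaussian with Sigma_0 = I_p before
    the change. From matrix index `change_point` (1-based) onward, variables
    0 and 1 of every row have correlation `rho` (a hypothetical Sigma_1).
    """
    rng = random.Random(seed)
    mix = (1.0 - rho * rho) ** 0.5
    matrices = []
    for m in range(1, num_matrices + 1):
        rows = []
        for _ in range(n):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if m >= change_point:
                # Correlate the first two variables; every variance stays 1.
                row[1] = rho * row[0] + mix * row[1]
            rows.append(row)
        matrices.append(rows)
    return matrices

seq = simulate_sequence(n=5, p=50, num_matrices=20, change_point=11, rho=0.9)
print(len(seq), len(seq[0]), len(seq[0][0]))
```

Here each matrix is short and fat ($n = 5$, $p = 50$), and only matrices with index $m \geq 11$ are affected by the change.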

At each time point $m$ the decision-maker decides to either
stop sampling, declaring that the change has occurred, i.e.,
$m \geq \gamma$, or to continue sampling. The decision to stop at time
$m$ is only a function of $(X(1), \ldots, X(m))$. Thus, the time
at which the decision-maker decides to stop sampling is a
stopping time for the matrix sequence $\{X(m)\}$. The decision-maker's
objective is to detect this change in distribution of the
data matrices as quickly as possible, subject to a constraint on
the false alarm rate.

The above detection problem is an example of the quickest 
change detection (QCD) problem. See 0 , 0 , and 0 for an 
overview of the QCD literature. The objective in our QCD
problem is to find a stopping time $\tau$ on the sequence of data
matrices $\{X(m)\}$, so as to minimize a suitable metric on the
delay $(\tau - \gamma)$, subject to a constraint on a suitable metric on the
event of false alarm $\{\tau < \gamma\}$. This paper follows the minimax
QCD formulation of Pollak [17]:

\[
\begin{aligned}
\min_{\tau}\ & \sup_{\gamma \geq 1} \mathrm{E}_\gamma[\tau - \gamma \mid \tau \geq \gamma] \\
\text{subj. to } & \mathrm{E}_\infty[\tau] \geq \beta,
\end{aligned}
\tag{2}
\]

where $\mathrm{E}_\gamma$ is the expectation with respect to the probability
measure under which the change occurs at $\gamma$, $\mathrm{E}_\infty$ is the
corresponding expectation when the change never occurs, and
$\beta > 1$ is a user-specified constraint on the mean time to false
alarm.

If the pre- and post-change densities $f^0_X$ and $f^1_X$ are known
to the decision maker, and $\mu_m$ is constant before and after the
change, then algorithms like the Cumulative Sum (CuSum)
algorithm [10], [11], [12], or the Shiryaev-Roberts family of
algorithms [18], [17], [19], can be used for efficient change
detection. Both the CuSum algorithm and the Shiryaev-Roberts
family of algorithms have strong optimality properties with
respect to both the popular formulations of Lorden [11] and
that of Pollak [17] used in this paper.

If only the pre-change and post-change shape functions $h_0$
and $h_1$ are known then (2) is a parametric QCD problem. In
this case, under the assumption that $\mu_m = \mu_0$, $m < \gamma$, and $\Sigma_0$
are known, efficient QCD algorithms can be designed, having
strong asymptotic optimality properties, based on, e.g., the 
generalized likelihood ratio (GLR) technique 0, the mixture 
based technique 0 , or the nonanticipating estimation based 
technique Gg. 

In many situations, however, the pre- and post-change shape
functions $h_0$ and $h_1$ may be unknown. This is the nonparametric
QCD setting considered in this paper. In addition
we are interested in the scenario where $p \gg n$. There is
no known nonparametric and efficient* solution to the QCD
problem in this high-dimensional regime. For example, for a
test based on empirical estimates of the covariance/correlation
matrix of $X$, one typically needs the dimension $p$ to be
smaller than the number of samples $n$. In the rest of the
paper we propose a technique for efficient quickest detection
of changes for the above nonparametric and high-dimensional
QCD problem.

We first propose a novel scalar summary statistic $V(X)$ on
the data matrix $X$. The theory from [21] helps us establish that
the proposed summary statistic has a well-defined exponential
limiting distribution as $p \to \infty$ for fixed $n$, the so-called
’’purely high dimensional regime” 0. This summary statistic 
is related to the empirical distribution of the vertex degree of 
the correlation graph associated with the thresholded sample 
correlation matrix. Below we show that the distribution of the
statistic $V(X)$ converges to a parametric distribution in the
exponential family in this purely high-dimensional regime.
Thus, the nonparametric QCD problem in terms of $\{X(m)\}$ is
mapped to a parametric QCD problem in terms of the summary
statistic sequence $\{V(X(m))\}$. We then apply a GLR-based
test suggested by Lorden in [11] to the sequence of summary
statistics $\{V(X(m))\}$ to detect the change efficiently.

* We call a test efficient if there is a linear relationship between the average
delay and the logarithm of the mean time to false alarm. This is a standard
notion of a good test in the literature; see [20] and also Theorem 4.1.

If the pre-change dispersion matrix $\Sigma_0$ is diagonal, then
we show below that this amounts to a known pre-change
parameter in the QCD problem of detecting a change in the
distribution of the statistic $V(X(m))$. In this case the GLR
stopping rule used is asymptotically optimal under the Lorden
minimax QCD model [11], and hence also in terms of solving (2)
among all rules that are stopping rules for the sequence
$\{V(X(m))\}$.

If the pre-change matrix $\Sigma_0$ is unknown and not diagonal,
then we have an unknown pre-change parameter in the QCD
problem based on $\{V(X(m))\}$. Below we establish conditions
on the matrix $\Sigma_0$ which guarantee that the GLR stopping rule
remains approximately optimal. This is achieved by analyzing
the performance of the GLR test under misspecification of
the pre-change distribution.

III. Summary Statistic for the Data Matrix X 

In this section we define the proposed summary statistic
$V(X)$ and obtain its asymptotic density in the purely
high-dimensional regime of $p \to \infty$, $n$ fixed.

The notation below follows the conventions of ID- For an 
elliptically distributed random data matrix $X$ we write

\[ X = [X_1, \ldots, X_p] = [X_{(1)}^T, \ldots, X_{(n)}^T]^T, \]

where $X_i = [X_{1i}, \ldots, X_{ni}]^T$ is the $i$-th column and $X_{(i)} =
[X_{i1}, \ldots, X_{ip}]$ is the $i$-th row. Define the $p \times p$ sample
covariance matrix as

\[ S = \frac{1}{n-1} \sum_{i=1}^{n} (X_{(i)} - \bar{X})^T (X_{(i)} - \bar{X}), \]

where $\bar{X}$ is the sample mean of the $n$ rows of $X$. Also define
the sample correlation matrix as

\[ R = D_S^{-1/2}\, S\, D_S^{-1/2}, \]

where $D_A$ denotes the matrix obtained by zeroing out all but
the diagonal elements of the matrix $A$. Note that, under our
assumption that the ensemble dispersion matrix $\Sigma$ of the rows
of $X$ is positive definite, $D_S$ is invertible with probability one.
Thus $R_{ij}$, the $ij$-th element of the matrix $R$, is the sample
correlation coefficient between the $i$-th and $j$-th columns of $X$.

Define $d^{(k)}_{NN}(i)$ to be the sample correlation between the $i$-th
column of $X$ and its $k$-th nearest neighbor in the columns of
$X$ (in terms of Euclidean distance):

\[ d^{(k)}_{NN}(i) := k\text{-th largest order statistic of } \{|R_{ij}|;\ j \neq i\}. \]

Then for fixed $k$, define the summary statistic

\[ V_k(X) := \max_{1 \leq i \leq p} d^{(k)}_{NN}(i). \tag{3} \]

This summary statistic is the maximal coherence of the kNN
neighborhoods, where coherence of a kNN neighborhood
is defined as the minimal magnitude correlation over that
neighborhood. Below we show that the distribution of the
statistic $V_k$ can be related to the distribution of an
integer-valued random variable $N_{\delta,\rho}$ that counts the number of highly
correlated neighborhoods.
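
The definition in (3) can be sketched in a few lines of code. The example below is a minimal, unoptimized illustration using only the Python standard library; the function names `sample_correlation` and `knn_coherence` and the small test matrix are hypothetical and chosen only for this example. It forms the sample correlation matrix from the sample covariance, takes the $k$-th largest off-diagonal magnitude in each row, and returns the maximum over columns.

```python
import math

def sample_correlation(X):
    """Sample correlation matrix R = D_S^{-1/2} S D_S^{-1/2} of the columns of X.

    X is a list of n rows, each a list of p numbers (n >= 2, no constant column).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # Sample covariance S with the 1/(n-1) normalization.
    S = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
          for j in range(p)] for i in range(p)]
    return [[S[i][j] / math.sqrt(S[i][i] * S[j][j]) for j in range(p)]
            for i in range(p)]

def knn_coherence(X, k=1):
    """V_k(X): maximum over columns i of the k-th largest |R_ij|, j != i."""
    R = sample_correlation(X)
    p = len(R)
    per_column = []
    for i in range(p):
        mags = sorted((abs(R[i][j]) for j in range(p) if j != i), reverse=True)
        per_column.append(mags[k - 1])  # this is d_NN^{(k)}(i)
    return max(per_column)

# Toy 3 x 3 data matrix: column 1 is exactly twice column 0.
X = [[1.0, 2.0, 1.0],
     [2.0, 4.0, -1.0],
     [3.0, 6.0, 0.5]]
print(knn_coherence(X, k=1))
print(knn_coherence(X, k=2))
```

For $k = 1$ the statistic is driven by the single most correlated pair of columns, while for $k = 2$ every column must have two strongly correlated neighbors before the statistic becomes large.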

We note that as $n \to \infty$ the collection $\{d^{(k)}_{NN}(i)\}$, for
$1 \leq i \leq p$, $1 \leq k < p$, specifies the entire population correlation
matrix, as the sample correlations between
all $k$-nearest neighbors of all orders $k$ completely specify
the sample correlation matrix. However, the summary statistic $V_k$
is a global statistic, and is insensitive to variations in the
minimal kNN correlations as long as the maximum of these
kNN correlations remains the same.

For a threshold parameter $\rho \in [0,1]$ define the correlation
graph $\mathcal{G}_\rho(R)$ associated with the correlation matrix $R$ as an
undirected graph with $p$ vertices, each representing a column
of the data matrix $X$. An edge is present between vertices $i$
and $j$ if the magnitude of the sample correlation coefficient
between the $i$-th and $j$-th components of the random vector $X$
is greater than $\rho$, i.e., if $|R_{ij}| > \rho$, $i \neq j$. We define $\delta_i$ to
be the degree of vertex $i$ in the graph $\mathcal{G}_\rho(R)$. For a positive
integer $\delta \leq p - 1$ we say that a vertex $i$ in the graph $\mathcal{G}_\rho(R)$
is a hub of degree $\delta$ if $\delta_i \geq \delta$. We denote by $N_{\delta,\rho}$ the total
number of hubs in the correlation graph $\mathcal{G}_\rho(R)$, i.e.,

\[ N_{\delta,\rho} = \mathrm{card}\{i : \delta_i \geq \delta\}. \]

For $k = \delta$, the events $\{V_\delta(X) > \rho\}$ and $\{N_{\delta,\rho} > 0\}$ are equivalent.
Hence

\[ P(V_\delta(X) > \rho) = P(N_{\delta,\rho} > 0). \tag{4} \]

Because of the above relation, for a fixed level $\rho$, $V_k(X)$
indicates the presence of star subgraphs of degree at least $k$
in the correlation network of threshold value $\rho$. Thus $V_k(X)$ is
an extreme value statistic that is only sensitive to the topology
of the correlation network through the distribution of star
subgraphs.
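
The equivalence behind (4) can be checked numerically on a small example. The sketch below is illustrative only: the helper names `kth_nn_stat` and `hub_count` and the $4 \times 4$ matrix of correlation values are hypothetical. It counts hubs of degree at least $\delta$ in the thresholded graph and compares the event that a hub exists with the event that the kNN statistic (with $k = \delta$) exceeds $\rho$.

```python
def kth_nn_stat(R, k):
    """V_k computed directly from a correlation matrix R (list of lists)."""
    p = len(R)
    return max(
        sorted((abs(R[i][j]) for j in range(p) if j != i), reverse=True)[k - 1]
        for i in range(p)
    )

def hub_count(R, rho, delta):
    """N_{delta,rho}: number of vertices with degree >= delta when an edge
    joins i and j whenever |R_ij| > rho."""
    p = len(R)
    degrees = [sum(1 for j in range(p) if j != i and abs(R[i][j]) > rho)
               for i in range(p)]
    return sum(1 for d in degrees if d >= delta)

# Hypothetical symmetric matrix of correlation values, for illustration only.
R = [[1.0, 0.9, 0.8, 0.1],
     [0.9, 1.0, 0.3, 0.2],
     [0.8, 0.3, 1.0, 0.05],
     [0.1, 0.2, 0.05, 1.0]]

for rho in (0.25, 0.5, 0.85):
    print(rho, kth_nn_stat(R, 2) > rho, hub_count(R, rho, 2) > 0)
```

In each row of the output the two booleans agree, which is the finite-sample content of (4).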

As in ID and pT) we say that a matrix is row sparse of 
degree k if there are no more than k nonzero entries in any 
row. We say that a matrix is block sparse of degree k if the 
matrix can be reduced to block diagonal form having a single 
k X k block, via row-column permutations. 

Theorem 3.1: Let $\Sigma$, the population dispersion matrix of the
rows of $X$, be row sparse of degree $k = o(p)$. Also let $p \to \infty$
and $\rho = \rho_p \to 1$ such that
$p^{1/\delta}(p-1)(1-\rho^2)^{(n-2)/2} \to e_{n,\delta} \in (0, \infty)$. Then:

1)
\[ P(V_\delta(X) > \rho) \to 1 - \exp(-\Lambda J_X/\phi(\delta)), \]
where $J_X$ is the function (defined as function $J$ in ID,
Equation (33)) of the distribution of the rows of $X$ (and hence of
$X$) and
\[ \Lambda = \lim_{p \to \infty,\ \rho \to 1} \Lambda(\rho) = \big((e_{n,\delta}\, a_n)/(n-2)\big)^{\delta}/\delta!, \]
with
\[ \Lambda(\rho) = p \binom{p-1}{\delta} P_0(\rho)^{\delta}, \qquad P_0(\rho) = a_n \int_{\rho}^{1} (1-u^2)^{\frac{n-4}{2}}\, du, \]
$a_n = 2B((n-2)/2, 1/2)$ with $B(l,m)$ the beta function, and
$\phi(\delta) = 2$ if $\delta = 1$, $\phi(\delta) = 1$ otherwise.

2) If $\Sigma$ is block sparse of degree $k$, then
\[ J_X = 1 + O\big((k/p)^{\delta+1}\big), \]
and if $\Sigma$ is diagonal then $J_X = 1$.
Proof: The result follows from (4) and from Proposition
2 in [21]:

\[ P(N_{\delta,\rho} > 0) \to 1 - \exp(-\Lambda J_X/\phi(\delta)), \]

under the same asymptotic limit of $p$ and $\rho$ specified in the
theorem statement. ■


In Section III-A below we provide some insights into the
nature of the parameter $J_X$ appearing in Theorem 3.1 above.
We first comment on the consequences of this theorem.

Using (4) and Theorem 3.1, the large-$p$ distribution of $V_k$
defined in (3) can be approximated, for $k = \delta$, by

\[ P(V_\delta(X) \leq \rho) = \exp(-\Lambda(\rho) J_X/\phi(\delta)), \quad \rho \in [0,1], \tag{5} \]

where $\Lambda(\rho)$ is as defined in Theorem 3.1. Although the limits
in Theorem 3.1 are guaranteed to hold for large values of $p$,
numerical experiments 0 have shown that the approximation 
(5) remains accurate for smaller values of $p$ as long as $n$ is
small and $p \gg n$.

The distribution (5) is differentiable everywhere except at
$\rho = 0$ since $P(V_\delta(X) = 0) > 0$ when using the finite $p$ and
$\rho < 1$ approximation $\Lambda(\rho)$ for $\Lambda$ specified in Theorem 3.1. For
$\rho > 0$ and large $p$, $V_\delta$ has density

\[ f_V(\rho) = -\frac{\Lambda'(\rho)}{\phi(\delta)}\, J_X \exp\left(-\frac{\Lambda(\rho)}{\phi(\delta)}\, J_X\right), \quad \rho \in (0,1]. \tag{6} \]

Note that $f_V$ in (6) is the density of the Lebesgue continuous
component of the distribution (5) and that it integrates to
$1 - O(e^{-p^2})$ over $\rho \in (0,1]$.

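The density $f_V$ is a member of a one-parameter exponential
family with $J_X$ as the unknown parameter. This follows from
the following relations. First

\[ \Lambda(\rho) = p \binom{p-1}{\delta} P_0(\rho)^{\delta} = C\, T(\rho)^{\delta}, \tag{7} \]

where

\[ C = C_{p,n,\delta} = p \binom{p-1}{\delta} a_n^{\delta} \tag{8} \]

does not depend on $\rho$, and

\[ T(\rho) = \int_{\rho}^{1} (1-u^2)^{\frac{n-4}{2}}\, du. \tag{9} \]

Using (7) and noting that $T'(\rho) = -(1-\rho^2)^{\frac{n-4}{2}}$, $f_V(\rho) =
f_V(\rho; J_X)$ is a member of the exponential family with parameter
$J_X > 0$:

\[ f_V(\rho; J_X) = \frac{C\delta}{\phi(\delta)}\, T(\rho)^{\delta-1} (1-\rho^2)^{\frac{n-4}{2}}\, J_X \exp\left(-\frac{C}{\phi(\delta)}\, J_X\, T(\rho)^{\delta}\right), \quad \rho \in (0,1]. \tag{10} \]
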
The vertex degree parameter $\delta$ in (10) is a fixed design
parameter that can be selected to maximize change detection
performance according to 0. In the sequel, we fix $\delta = 1$.
For this value of $\delta$, the statistic $V_\delta$ reduces to the maximal
magnitude correlation

\[ V(X) = \max_{i \neq j} |R_{ij}|, \tag{11} \]

and the density in (10) reduces to

\[ f_V(\rho; J) = \frac{C}{2}\, (1-\rho^2)^{\frac{n-4}{2}}\, J \exp\left(-\frac{C}{2}\, J\, T(\rho)\right), \quad \rho \in (0,1], \tag{12} \]

where we have suppressed the subscript $X$ in the exponential
family parameter $J$, which depends on the distribution of $X$.
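
To make the density (12) concrete, the following sketch evaluates $T(\rho)$ from (9) by numerical integration and then checks that the density integrates over $(0,1]$ to $1 - \exp(-\tfrac{C}{2} J\, T(0))$, which is what differentiating the distribution function implies. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the constant $C$ from (8) is passed in as a plain number rather than computed from $p$ and $n$, and the values $n = 6$, $C = 6$, $J = 1$ are arbitrary choices that keep the arithmetic easy to follow.

```python
import math

def simpson(f, a, b, intervals=200):
    """Composite Simpson rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def T(rho, n):
    """T(rho) in (9): integral of (1 - u^2)^((n-4)/2) over [rho, 1]; assumes n >= 4."""
    return simpson(lambda u: (1.0 - u * u) ** ((n - 4) / 2.0), rho, 1.0)

def density(rho, J, n, C):
    """Density (12) for delta = 1; C is the constant in (8), supplied by the caller."""
    weight = (1.0 - rho * rho) ** ((n - 4) / 2.0)
    return 0.5 * C * weight * J * math.exp(-0.5 * C * J * T(rho, n))

n, C, J = 6, 6.0, 1.0
mass = simpson(lambda r: density(r, J, n, C), 0.0, 1.0)
print(mass, 1.0 - math.exp(-0.5 * C * J * T(0.0, n)))
```

The two printed numbers should agree closely. For realistic $p$ the constant $C$ is very large, so the missing mass at $\rho = 0$ is negligible, consistent with the remark following (6).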

In Fig. 1 the density $f_V$ is plotted for various values of $J$
for $n = 10$ and $p = 100$. We note that for the chosen values
of $n$ and $p$, the density is concentrated close to $\rho = 1$.

Fig. 1. Plot of density $f_V$ in (12) for various values of the parameter $J$ for
$n = 10$, $p = 100$. This is the density of the summary statistic used to detect
a change in covariance of the random matrix sequence $X$.


A. Interpretation of the parameter $J = J_X$

The asymptotic approximation to the probability $P(N_{\delta,\rho} > 0)$,
used in Theorem 3.1, is obtained in [21] by relating $N_{\delta,\rho}$
to a Poisson random variable in the purely high-dimensional
limit as $p \to \infty$ and $n$ fixed. The first step in the process is a
Z-score representation of the sample correlation $R$:

\[ R = Z^T Z, \qquad Z = [Z_1, \ldots, Z_p], \qquad Z_i = \frac{X_i - \hat{\mu}_i \mathbf{1}}{\sqrt{S_{ii}}\,\sqrt{n-1}}, \quad i = 1, \ldots, p, \]

where $\hat{\mu}_i$ is the sample mean of the $i$-th column $X_i$ and $S_{ii}$ is
the $i$-th diagonal element of $S$. These Z-scores lie on an
$(n-2)$-dimensional unit sphere, since

\[ \mathbf{1}^T Z_i = 0 \quad \text{and} \quad \|Z_i\|_2 = 1. \]

Due to the fact that

\[ Z_i^T Z_j = R_{ij}, \quad \text{and} \quad \|Z_i - Z_j\| = \sqrt{2(1 - R_{ij})}, \]

the correlation between the columns of the data matrix is
directly related to the Euclidean distance between their
corresponding Z-scores. The parameter $J = J_X$ is a limiting
value of an average of the joint density of the Z-scores. It is
a Z-score uniformity measure: $J = 1$ implies the scores are
uniformly distributed on the $(n-2)$-dimensional sphere, $J > 1$
if the scores are homophilic in nature, and $J < 1$ if they are
homophobic. For more details we refer the readers to 0- 
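
The Z-score identities used above are easy to verify numerically. The short sketch below, with a hypothetical helper `z_score` and two made-up data columns, centers and normalizes each column and then checks that the inner product of two Z-scores is the sample correlation and that their Euclidean distance equals $\sqrt{2(1 - R_{ij})}$.

```python
import math

def z_score(column):
    """Centered, unit-norm version of a data column (its Z-score)."""
    n = len(column)
    mean = sum(column) / n
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(v * v for v in centered))  # equals sqrt(S_ii * (n - 1))
    return [v / norm for v in centered]

# Two illustrative columns of a data matrix with n = 4 rows.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
zx, zy = z_score(x), z_score(y)

r = sum(a * b for a, b in zip(zx, zy))                       # sample correlation R_ij
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(zx, zy)))  # ||Z_i - Z_j||
print(r, dist, math.sqrt(2.0 * (1.0 - r)))
```

The last two printed values coincide, so ranking neighbors by correlation and ranking them by Z-score distance give the same ordering when correlations are positive.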


IV. QCD FOR LARGE SCALE RANDOM MATRICES

In this section we apply the asymptotic results of Theorem 3.1,
derived in Section III, to quickest change detection
of the maximal kNN coherence in the data matrix sequence
$\{X(m)\}$. We assume that both the pre- and post-change
dispersion matrices, $\Sigma_0$ and $\Sigma_1$, are row sparse with degree
$k = o(p)$, and map the data matrix sequence $\{X(m)\}$ to the
sequence of summary statistics $\{V_\delta(X(m))\}_{m \geq 1}$, with $\delta = 1$.
For simplicity we refer to this sequence by $\{V(m)\}$. In the
asymptotic regime considered in Theorem 3.1, the random
variables $\{V(m)\}$ each have an approximate asymptotic density
$f_V$ of the form (12). A change in the distribution of the data
matrix sequence $\{X(m)\}$ at time $\gamma$ changes the parameter $J$
in the density $f_V$ of the variable $V$. Let $J_0$ and $J_1$ be the values of
the parameter $J$ before and after the change point $\gamma$, respectively. The
QCD problem on the density $f_X$, depicted in (1), is reduced
to the following QCD problem on the density $f_V$:

\[
\begin{aligned}
V(m) &\sim f_V(\cdot\,; J_0), \quad m < \gamma, \\
     &\sim f_V(\cdot\,; J_1), \quad m \geq \gamma.
\end{aligned}
\tag{13}
\]


Below we consider two cases of row-sparse pre-change dispersion
matrices:

1) $\Sigma_0$ is diagonal, and

2) $\Sigma_0$ is not diagonal but is block-sparse.

Note that $\Sigma_1$ is only assumed to be row-sparse.

If $\Sigma_0$ is diagonal then, from Theorem 3.1, $J_0 = 1$, and the
QCD problem in (13) reduces to a QCD problem of change
in parameter of an exponential family, with known pre-change
parameter value. The change in this case can be efficiently
detected using Lorden's GLR test [11] (also see Section IV-A
below), and the test can be designed using the performance
analysis provided in [11].

In the case of a nondiagonal dispersion matrix $\Sigma_0$, the QCD
problem in (13) has an unknown pre-change parameter $J_0$.
There are no known efficient solutions to the QCD problem in
this case. However, we recall that if the dispersion matrix $\Sigma_0$
is only block-sparse with degree $k \ll p$, then by assertion 2 of
Theorem 3.1, $J_0$ is close to 1, i.e., $|J_0 - 1|$ is small. Motivated
by this fact we use Lorden's test in this case as well, with
$J_0$ set equal to 1, and characterize the range of pre-change
parameters close to 1 for which the change can be detected
efficiently. Specifically, in Section IV-B below, we provide
delay and false alarm analysis of Lorden's test when there
is a misspecification in the distribution. The performance
analysis is provided for an arbitrary one-parameter exponential
family, and not just for the family in (12). To the best of our
knowledge, this is the first such analysis to be reported in the
literature.


A. QCD with Diagonal Pre-Change Dispersion Matrix $\Sigma_0$

If the pre-change dispersion matrix is diagonal, then from
Theorem 3.1, $J_0 = 1$, and the QCD problem (13) reduces
to the parametric QCD problem with unknown post-change
parameter $J$:

\[
\begin{aligned}
V(m) &\sim f_V(\cdot\,; 1), & m < \gamma, \\
     &\sim f_V(\cdot\,; J), \quad J \neq 1, & m \geq \gamma.
\end{aligned}
\tag{14}
\]

Consider the following QCD test, defined by the stopping
time $\tau_G$:

\[ \tau_G = \inf\left\{ m \geq 1 : \max_{1 \leq k \leq m}\ \sup_{J : |J - 1| \geq \epsilon}\ \sum_{i=k}^{m} \log \frac{f_V(V(i); J)}{f_V(V(i); 1)} > A \right\}, \tag{15} \]

where $A$ and $\epsilon > 0$ are user-defined parameters. The parameter
$A$ is a threshold used to control the false alarm rate. The
parameter $\epsilon$ represents the minimum magnitude of change,
away from $J = 1$, that the user wishes to detect.

The stopping rule $\tau_G$ was shown to be asymptotically optimal
in [11] for a related QCD problem when 1) the marginal
density $f_V(v; \cdot)$ of the observation sequence $\{V(m)\}$ is of
known form that is a member of a one-parameter exponential
family, and 2) the parameter $J_0$ of the pre-change
density is known. Both of these properties are satisfied for
the summary statistic $V = V(X)$ for the QCD model in (14),
since $J_0 = 1$. Due to the results in p^ , the stopping rule $\tau_G$
is asymptotically optimal for the problem (2) as well.

The following theorem establishes strong asymptotic optimality
of this test with $\{V(m)\}$ as the observation sequence.
It also provides delay and false alarm estimates by which the
test can be designed.

Theorem 4.1: Fix any $\epsilon > 0$.

1) For the stopping rule $\tau_G$, the supremum in (2) is achieved
at $\gamma = 1$, i.e.,

\[ \sup_{\gamma \geq 1} \mathrm{E}_\gamma[\tau_G - \gamma \mid \tau_G \geq \gamma] = \mathrm{E}_1[\tau_G - 1]. \]

2) Setting $A = \log \beta$ ensures that as $\beta \to \infty$,

\[ \mathrm{E}_\infty[\tau_G] \geq \beta(1 + o(1)), \]

and for each possible true post-change parameter $J$, with
$|J - 1| \geq \epsilon$,

\[ \mathrm{E}_1[\tau_G] = \frac{\log \beta}{I(J)}(1 + o(1)) = \inf_{\tau : \mathrm{E}_\infty[\tau] \geq \beta}\ \sup_{\gamma \geq 1} \mathrm{E}_\gamma[\tau - \gamma \mid \tau \geq \gamma]\,(1 + o(1)), \tag{16} \]

where $I(J)$ is the Kullback-Leibler divergence between
the densities $f_V(\cdot\,; J)$ and $f_V(\cdot\,; 1)$.

Theorem 4.1 implies that the stopping rule $\tau_G$ is uniformly
asymptotically optimal for each post-change parameter $J$, as
long as $|J - 1| \geq \epsilon$. For convenience of implementation one
can also use the window-limited variation of $\tau_G$ as suggested
in irg.


B. QCD with Block-Sparse Pre-Change Dispersion Matrix $\Sigma_0$

As discussed above, if the pre-change dispersion matrix $\Sigma_0$
is not diagonal, then the pre-change parameter $J_0 \neq 1$. If $\Sigma_0$
is block-sparse with degree $k \ll p$, then by Theorem 3.1,
$|J_0 - 1|$ is small. This motivates the use of Lorden's test as
in (15) for QCD, but then Theorem 4.1 no longer applies
since $f_V(\cdot\,; J_0)$ with $J_0 = 1$ is a misspecification of the true
pre-change distribution. In this section we analyze the performance
of Lorden's GLR test for $\{f_V(\cdot\,; J)\}$ vs $f_V(\cdot\,; 1)$ when the samples are
drawn from a density $g$. The theorem proven below is in fact
applicable to a broader class of scalar parameter exponential
families, not just to the family (12) considered in this
paper.

* The subscript G in $\tau_G$ is used to denote a GLR test. This is not to be
confused with the use of density function $g$ in the misspecification analysis
to follow.

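For scalar parameter $\theta$ let $\{f_\theta\}$ be a parametric exponential
family of distributions with respect to a $\sigma$-finite measure $\mu$,

$$f_\theta(y) = \exp\big(\theta \, T(y) - b(\theta)\big), \quad \theta \in \Theta, \tag{17}$$

where $\Theta$ is a specified interval on the real line and $b(\theta)$ is
differentiable everywhere on $\Theta$.

The QCD test $\tau_G$ applied to this family for detecting a
change from $f_{\theta_0}$ to $f_\theta$, with $|\theta - \theta_0| \geq \epsilon$, in an observation
sequence $\{Y_m\}$ is given by

$$\tau_G = \inf\left\{ m \geq 1 : \max_{1 \leq k \leq m} \, \sup_{\theta : |\theta - \theta_0| \geq \epsilon} \, \sum_{i=k}^{m} \log \frac{f_\theta(Y_i)}{f_{\theta_0}(Y_i)} > A \right\}. \tag{18}$$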

Below we provide performance bounds for the mean time to
false alarm and the average detection delay when the samples
have density $g$. Specifically, we provide bounds on $E_g[\tau_G]$,
where $E_g$ denotes expectation with respect to the probability
measure under which all the samples $\{Y_m\}$ have density $g$.
When the density $g$ is close to $f_{\theta_0}$, the expression $E_g[\tau_G]$ can
be interpreted as an estimate of the mean time to false alarm.
When the density $g$ is close to $f_\theta$, for some $\theta$ with $|\theta - \theta_0| \geq \epsilon$,
the expression $E_g[\tau_G]$ can be interpreted as an estimate
of the average time to change detection.
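As an illustration of the stopping rule (18), the following is a minimal Python sketch for one concrete member of the exponential family: the unit-variance Gaussian mean family. That specialization is an assumption made here for illustration. With $d = \theta - \theta_0$, $n = m - k + 1$ and $u$ the mean of $Y_k, \ldots, Y_m$ minus $\theta_0$, the inner sum equals $n(du - d^2/2)$, so the supremum over $|d| \geq \epsilon$ is attained at $d = \mathrm{sign}(u)\max(|u|, \epsilon)$. The function name `glr_stopping_time` and the brute-force scan over $k$ are hypothetical choices, not taken from the paper.

```python
import math


def glr_stopping_time(ys, theta0, eps, threshold):
    """First m at which the GLR statistic of (18) exceeds `threshold`.

    Assumes unit-variance Gaussian observations (an illustrative choice).
    Returns None if the statistic never crosses the threshold.
    """
    for m in range(1, len(ys) + 1):
        best = -math.inf
        running_sum = 0.0
        # Scan candidate change points k = m, m-1, ..., 1.
        for k in range(m, 0, -1):
            running_sum += ys[k - 1]
            n = m - k + 1
            u = running_sum / n - theta0
            # Constrained maximizer |d| >= eps of n * (d*u - d^2/2).
            d = max(abs(u), eps)
            stat = n * (d * abs(u) - 0.5 * d * d)
            best = max(best, stat)
        if best > threshold:
            return m
    return None


if __name__ == "__main__":
    # Mean 0 for five samples, then mean 3: the rule fires right after the change.
    print(glr_stopping_time([0.0] * 5 + [3.0] * 5, theta0=0.0, eps=1.0, threshold=4.0))
```

The scan costs $O(m)$ work per new sample; a window-limited variant would cap the range of $k$ to bound that cost.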

1) Mean Time to False Alarm: We first provide a lower
bound on $E_g[\tau_G]$ when $g$ is not necessarily equal to $f_{\theta_0}$,
but is close to $f_{\theta_0}$ in a particular sense. The closeness is
characterized through the following three assumptions.

Assumption 1: There exists a positive constant $\kappa_{\theta,g}$ such
that for every $\theta \in \Theta$ with $|\theta - \theta_0| \geq \epsilon$

$$\int \left( \frac{f_\theta(y)}{f_{\theta_0}(y)} \right)^{\kappa_{\theta,g}} g(y) \, d\mu(y) = 1. \tag{19}$$

Furthermore, there exists $\kappa_g$ such that

$$0 < \kappa_g \leq \inf_{\theta \in \Theta : |\theta - \theta_0| \geq \epsilon} \kappa_{\theta,g}. \tag{20}$$

The condition in (19) is the classical condition needed
to analyze one-sided tests under misspecification p^ . The
condition in (20) is an additional condition that will be needed
for analysis of the GLR test defining the stopping time ( fTSl l.

Let $\mathcal{G}$ be a family of densities on the real line, for example,
$\mathcal{G} \subset \{f_\theta\}$.

Assumption 2: There exists a positive constant $\kappa^*$ such that

$$0 < \kappa^* \leq \kappa_g, \quad \forall g \in \mathcal{G}. \tag{21}$$

Assumption 3: The KL-divergence $I(\theta)$ between $f_\theta$ and $f_{\theta_0}$
increases with $|\theta - \theta_0|$.


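Theorem 4.2:

1) If Assumption 1 and Assumption 3 are satisfied then

$$E_g[\tau_G] \geq \frac{e^{\kappa_g A}}{2 \left( \dfrac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} + 1 \right)}.$$

2) Furthermore, if Assumption 2 is also satisfied then for
every $g \in \mathcal{G}$

$$E_g[\tau_G] \geq \frac{e^{\kappa^* A}}{2 \left( \dfrac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} + 1 \right)}.$$

Proof: The proof is provided in the appendix. ■

We note that the lower bound in the second part of the above
theorem is not a function of the density $g$. Also, if $g = f_{\theta_0}$
then $\kappa^* = \kappa_g = \kappa_{\theta,g} = 1$, $\forall \theta$, and the lower bounds agree
with that obtained in GD-

2) Average Detection Delay: We next obtain an upper
bound on $E_g[\tau_G]$ when $g$ is close to one of the members
of the post-change set of densities. The closeness here is
characterized by the following assumption.

Assumption 4: $\exists \, \theta_g$, s.t. $|\theta_g - \theta_0| \geq \epsilon$ and

$$\int \log\left[ \frac{f_{\theta_g}(y)}{f_{\theta_0}(y)} \right] g(y) \, d\mu(y) > 0.$$

Theorem 4.3: If Assumption 4 is satisfied then

$$E_g[\tau_G] \leq \frac{A}{\int \log[f_{\theta_g}(y) / f_{\theta_0}(y)] \, g(y) \, d\mu(y)} \, (1 + o(1)) \quad \text{as } A \to \infty.$$

Proof: The proof is provided in the appendix. ■

We note that if $g = f_\theta$, for $|\theta - \theta_0| \geq \epsilon$, then the upper bound
in Theorem 4.3 is the performance of the GLR test as obtained
in [11].

3) Example from a Gaussian family: We now give an
example where the conditions of Theorem 4.2 are satisfied.
We state the result as a lemma.

Lemma 1: Consider a Gaussian family

$$f_\theta(y) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(y - \theta)^2}{2}}, \quad \theta \in (-\infty, \infty),$$

where we are trying to detect a change in mean from a level
$\theta_0$ to a level $\theta$, with $|\theta - \theta_0| \geq \epsilon$, and the actual samples $\{Y_m\}$
have density $g = f_{\tilde{\theta}_0}$. Then the following hold.

1) There exists a $\kappa_{\theta,g} > 0$ given by

$$\kappa_{\theta,g} = 1 + \frac{2(\theta_0 - \tilde{\theta}_0)}{\theta - \theta_0} \tag{22}$$

satisfying (19) provided $|\tilde{\theta}_0 - \theta_0| < \epsilon/2$.

2) There exists a $\kappa_g > 0$ given by

$$\kappa_g = \min\{\kappa_{\theta_0 + \epsilon, g}, \, \kappa_{\theta_0 - \epsilon, g}\} = 1 - \frac{2 |\tilde{\theta}_0 - \theta_0|}{\epsilon} \tag{23}$$

that satisfies (20).

3) Let

$$\mathcal{G} = \{ f_{\tilde{\theta}} : |\tilde{\theta} - \theta_0| \leq \epsilon/3 \}.$$

Then

$$\kappa^* = \min_{g \in \mathcal{G}} \kappa_g = 1 - \frac{2(\epsilon/3)}{\epsilon} = \frac{1}{3}$$

satisfies Assumption 2. That is, $\kappa^*$ is the minimum of
$\kappa_g$ with $g = f_{\theta_0 + \epsilon/3}$ and $\kappa_g$ with $g = f_{\theta_0 - \epsilon/3}$.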

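Proof: For the Gaussian case the integral equation

$$\int \left( \frac{f_\theta(y)}{f_{\theta_0}(y)} \right)^{\kappa_{\theta,g}} f_{\tilde{\theta}_0}(y) \, dy = 1$$

can be explicitly solved for $\kappa_{\theta,g}$, giving two solutions: $\kappa_{\theta,g} = 0$
and that given by (22). The latter is positive for every $\theta$ with
$|\theta - \theta_0| \geq \epsilon$ only if $|\tilde{\theta}_0 - \theta_0| < \epsilon/2$.
This proves the first part of the lemma.

The second part is true because $\kappa_{\theta,g}$ is monotonic in $\theta$ on
each side of $\theta_0$, and its value is smallest when $\theta$ is either equal
to $\theta_0 + \epsilon$ or $\theta_0 - \epsilon$. This value has the explicit expression
given by the rightmost expression in (23).

The third part of the lemma is true because the expression
for $\kappa_g$ given in (23) is monotonic in $|\tilde{\theta}_0 - \theta_0|$. ■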
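The closed forms in Lemma 1 can be checked numerically. The sketch below, in Python with only the standard library, evaluates the integral of Assumption 1 by the trapezoidal rule for the unit-variance Gaussian family and confirms that the value of $\kappa_{\theta,g}$ in (22) makes it equal to one. The helper names, the integration window and the step count are illustrative assumptions.

```python
import math


def kappa_gaussian(theta, theta0, theta0_tilde):
    """Non-zero root from (22): 1 + 2 (theta0 - theta0_tilde) / (theta - theta0)."""
    return 1.0 + 2.0 * (theta0 - theta0_tilde) / (theta - theta0)


def moment_integral(kappa, theta, theta0, theta0_tilde, half_width=12.0, steps=20000):
    """Trapezoidal estimate of int (f_theta / f_theta0)^kappa f_theta0_tilde dy."""
    lo = theta0_tilde - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        # Log-likelihood ratio of two unit-variance Gaussians.
        log_ratio = (theta - theta0) * y - 0.5 * (theta ** 2 - theta0 ** 2)
        log_g = -0.5 * (y - theta0_tilde) ** 2 - 0.5 * math.log(2.0 * math.pi)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(kappa * log_ratio + log_g)
    return total * h


if __name__ == "__main__":
    theta0, theta0_tilde, eps = 0.0, 0.2, 1.0
    for theta in (theta0 + eps, theta0 - eps):
        k = kappa_gaussian(theta, theta0, theta0_tilde)
        print(theta, k, moment_integral(k, theta, theta0, theta0_tilde))
```

In this example the smaller of the two boundary values is the one at $\theta_0 + \epsilon$, which agrees with the right-hand side of (23).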


4) Application to the family $\{f_J\}$: The conditions of Theorem 4.2
are also satisfied by the family $\{f_J\}$ defined in (12)
for large $p$.

We consider a change in parameter $J$ of the family

fv{p;J) = ^(1 - Vexp T{p)] , P e ( 0 , 1 ], 


(24) 

from 1 to $J$ with $|J - 1| \geq \epsilon$. If the samples are drawn from
$f_{J_0}$, i.e., $g = f_{J_0}$, we show via numerical computation that,
similar to the Gaussian case, the worst case $\kappa_g$ is achieved at
the boundary. To show this we fix $\epsilon = 0.9$, $J_0 = 1.1$ and plot
the integral in (19) for various values of $J$, $|J - 1| \geq \epsilon$, in
Fig. 2. Specifically, we use $J = 0.4, 1.9, 5, 10, 15$, and plot
the integral $\int (f_J(v)/f_1(v))^{\kappa} f_{J_0}(v) \, dv$ as a function of
$\kappa$. In the figure we can see the points at which the integral
equals 1 (identified for example by the labels $\kappa_{1.9}$ and $\kappa_5$), and
the smallest such point corresponds to the curve for parameter
$J = 1 + \epsilon = 1 + 0.9 = 1.9$. By varying $J_0$ and $\epsilon$ one can show
that there is an interval around 1 for which the corresponding
$\kappa_{\theta,g}$ is positive, and the smallest value is achieved by either
of the post-change parameters $1 + \epsilon$ or $1 - \epsilon$. Thus, the results
are analogous to the Gaussian case analyzed in Lemma 1.



Fig. 2. Plot of the integral $\int (f_J(v)/f_1(v))^{\kappa} f_{J_0}(v) \, dv$ as a function of
$\kappa$, for various values of $J$: $J = 0.4, 1.9, 5, 10, 15$. The dashed lines are used
to show the point of intersection of the curve corresponding to a particular
value of $J$ with the straight line at height 1. Note that the value of $\kappa$ at which
the curves take value 1, $\kappa_J$, increases with the parameter value $J$ for $J > 1$,
and $\kappa_J < 1$ for $J > 1$ and $\kappa_J > 1$ for $J < 1$. Thus the smallest $\kappa$ is
achieved by the parameter $J = 1 + \epsilon$, which in this case is $1 + \epsilon = 1.9$.


V. Numerical Results 


Here we apply the stopping rule $\tau_G$ in ( [TS] ) to the problem
of detecting a change in the distribution when the $\{X(m)\}$
are Gaussian distributed random matrices. In this case the
dispersion $\Sigma$ is the covariance matrix of the rows of $X$.
The pre-change covariance is the $p \times p$ diagonal matrix
$\Sigma_0 = \mathrm{diag}(\sigma_i^2)$, where $\sigma_i^2 > 0$ are arbitrary
component-wise variances. The post-change covariance matrix $\Sigma_1$ is a
row-sparse matrix of degree $k$, obtained as follows. A $p \times p$
sample from the Wishart distribution is generated and some
of the entries are forced to be zero in such a way that no row
has more than $k$ non-zero elements. Specifically, we retain
the top left $k \times k$ block of the matrix, and for each row
$i$, $k + 1 \leq i \leq (p + k)/2$, all but the diagonal and the
$(p + k + 1 - i)$th element are forced to zero. Each time an
entry $(i, j)$ is set to zero, the entry $(j, i)$ is also set to zero, to
maintain symmetry. Finally, a scaled diagonal matrix is also
added to $\Sigma_1$ to restore its positive definiteness. We set $n = 10$,
$p = 100$, and $k = 5$.
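The construction of the post-change covariance can be sketched in plain Python as follows. Several details are assumptions, since the text does not specify them: the Wishart sample is taken as $G G^T$ with $G$ a $p \times p$ standard Gaussian matrix, the sparsity pattern is read as a keep-mask (top-left $k \times k$ block, diagonal, and the pairs $(i, p + k + 1 - i)$ for $i > k$), and positive definiteness is restored by a diagonal shift that makes the matrix strictly diagonally dominant. The function name and the `margin` parameter are hypothetical.

```python
import random


def sparse_post_change_cov(p, k, seed=0, margin=1e-3):
    """Row-sparse symmetric positive definite matrix with at most k non-zeros per row."""
    rng = random.Random(seed)
    # Wishart-type sample W = G G^T (identity scale, p degrees of freedom: assumed).
    g = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)]
    w = [[sum(g[i][t] * g[j][t] for t in range(p)) for j in range(p)] for i in range(p)]

    def keep(i, j):
        """Keep-mask in 1-based indices; symmetric in (i, j) by construction."""
        if i == j:
            return True
        if i <= k and j <= k:
            return True
        return i > k and j > k and i + j == p + k + 1

    s = [[w[i][j] if keep(i + 1, j + 1) else 0.0 for j in range(p)] for i in range(p)]

    # Diagonal loading: strict diagonal dominance implies positive definiteness.
    deficit = max(
        sum(abs(s[i][j]) for j in range(p) if j != i) - s[i][i] for i in range(p)
    )
    shift = max(deficit, 0.0) + margin
    for i in range(p):
        s[i][i] += shift
    return s


if __name__ == "__main__":
    sigma1 = sparse_post_change_cov(p=100, k=5)
    print(max(sum(1 for x in row if x != 0.0) for row in sigma1))
```

Diagonal dominance is a conservative way to guarantee positive definiteness without an eigenvalue routine; a smaller shift based on the minimum eigenvalue would also work.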

To implement $\tau_G$ we have chosen $\epsilon = 1.5$, and we use
the maximum likelihood estimator which, as a function of $m$
samples $(V(1), \ldots, V(m))$ from $f_V(\cdot; J)$, is given by


J(l/(1),--- ,L(m)) 


^iET=lT{V{^)y 


(25) 


Specifically,

$$\arg\max_{J : J \geq 2.5} \; \sum_{i=\ell}^{m} \log f_V(V(i); J) = \max\{2.5, \, \hat{J}(V(\ell), \ldots, V(m))\}. \tag{26}$$

In Fig. 3 we plot the delay ($E_1[\tau_G]$) vs the log of mean
time to false alarm ($\log E_\infty[\tau_G]$) for various values of the
post-change parameter $J$. The values in the figure are obtained by
choosing different values of the threshold $A$ and estimating the
delay by choosing the change point $\gamma = 1$ and simulating the
test for 500 sample paths. The mean time to false alarm values
are estimated by simulating the test for 1500 sample paths.
The parameter $J$ for the post-change distribution is estimated
using the maximum likelihood estimator (25). As predicted
by the theory, the delay vs log of false alarm trade-off curve
is approximately linear. For larger values of $J$, the
Kullback-Leibler (K-L) divergence between $f_V(\cdot; J)$ and $f_V(\cdot; 1)$ is also
larger, resulting in smaller delays. For the chosen values of
the post-change parameters $J = 1.99$, $3.5$, $5.97$ and $21.45$, the
corresponding K-L divergence values $I(J)$ are 0.1906, 0.5385,
0.9543, and 2.1123, respectively.
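The four K-L values quoted above are numerically consistent with the closed form $\log J + 1/J - 1$, which is the K-L divergence between exponential distributions with rates $J$ and 1. Treating $f_V(\cdot; J)$ as equivalent to such a family after a monotone transformation of $V$ is an assumption of this sketch, not a statement made in this section. Under that assumption, the Python snippet below recomputes the values and evaluates the first-order delay $\log \beta / I(J)$ suggested by Theorem 4.1; the function names are hypothetical.

```python
import math


def kl_rate_j_vs_rate_1(j):
    """K-L divergence between Exp(rate j) and Exp(rate 1): log j + 1/j - 1 (assumed model)."""
    return math.log(j) + 1.0 / j - 1.0


def first_order_delay(beta, j):
    """First-order delay approximation log(beta) / I(J) for false alarm level beta."""
    return math.log(beta) / kl_rate_j_vs_rate_1(j)


if __name__ == "__main__":
    reported = {1.99: 0.1906, 3.5: 0.5385, 5.97: 0.9543, 21.45: 2.1123}
    for j, value in reported.items():
        print(j, round(kl_rate_j_vs_rate_1(j), 4), value)
    # Larger J means larger divergence and hence smaller predicted delay.
    print([round(first_order_delay(1e4, j), 2) for j in reported])
```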

In Fig. 4 we compare the delay vs false alarm trade-off
curve for the post-change parameter $J = 3.5$ plotted in
Fig. 3 with the values predicted by the theory (Theorem 4.1).

We see from Fig. 4 that the predictions are quite accurate. We
have obtained similar results when the test was simulated for
different sparsity degrees $k$. Thus, the change can be efficiently
detected using our proposed methodology.

Fig. 3. The empirical mean time to detect vs mean time to false alarm (in
log scale). The mean time to detect decreases as the parameter $J$ increases,
and the relation between $E_1[\tau_G]$ and $\log(E_\infty[\tau_G])$ is linear, as predicted by
Theorem 4.1. The K-L divergence values for $J = 1.99$, $3.5$, $5.97$ and $21.45$
are 0.1906, 0.5385, 0.9543, and 2.1123, respectively. The slopes of the lines
are approximately the inverse of the K-L divergence values.

Fig. 4. Comparison of the delay vs false alarm trade-off curve for $J = 3.5$
from Fig. 3 with the values predicted by the theory.
The difference diminishes as the mean time to false alarm increases. This is
due to the asymptotic nature of the results in Theorem 4.1.

Finally, in Fig. 5 we plot the delay vs false alarm trade-off
curves for a misspecification scenario. We consider the
situation when we wish to detect a change in correlation in
the sequence of random matrices $\{X(m)\}$. Suppose we believe
the pre-change dispersion matrix $\Sigma_0$ to be diagonal and
the post-change dispersion matrix $\Sigma_1$ to be row sparse. As
a result we choose to apply Lorden's GLR rule $\tau_G$
with $\epsilon = 1.5$ for change detection. The first curve from the
bottom in Fig. 5 is the performance, obtained via simulations,
of the rule $\tau_G$ when $\Sigma_0$ is indeed diagonal and the post-change
parameter is $J = 3.149$. This curve will serve as a benchmark.

Suppose now that in fact the matrix $\Sigma_0$ is not diagonal,
and hence some correlation is present between the variables
before the change occurs. Specifically, we assume that $\Sigma_0$ is
block sparse with block size 5. For the $\Sigma_0$ we have chosen, the
parameter $J_0$ is 1.31, and the performance of the stopping rule
$\tau_G$ for this case is shown by the second curve from below in
Fig. 5. As expected there is a loss in performance because of
the misspecification in the pre-change parameter $J_0$. For this
plot the threshold $A$ for $\tau_G$ was chosen using the knowledge
of the pre-change parameter $J_0$.

The remaining top two plots in Fig. 5 show the loss
in performance as predicted by Theorem 4.2. As suggested by
the theorem the delay-false alarm trade-off is given by

$$E_1[\tau_G] = \frac{\log \beta}{\kappa_g \, I(J)} (1 + o(1)), \quad \text{as } \beta \to \infty,$$

when the pre-change parameter is known, and by

$$E_1[\tau_G] = \frac{\log \beta}{\kappa^* \, I(J)} (1 + o(1)), \quad \text{as } \beta \to \infty,$$

when the pre-change parameter is known only within an uncertainty
class. The topmost plot in Fig. 5 is the trade-off curve
for the case when we only know that $|J_0 - 1| \leq 0.4$. Thus,
$\kappa^*$ is obtained by solving $\int (f_J(v)/f_1(v))^{\kappa} f_{J_0}(v) \, dv = 1$
for $J = 2.5$ and $J_0 = 1.4$. The value obtained is 0.33. The
second curve from the top is the trade-off curve when the
value of $\kappa$ is obtained by using the knowledge of the pre-change
parameter. The value of $\kappa = \kappa_g$ so obtained is 0.47. As
expected, these estimates show a significant loss in performance
because the thresholds have to be chosen to satisfy the false
alarm constraint, without the knowledge of the pre- and post-change
parameters.
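The two values 0.33 and 0.47 can be reproduced with a short root-finding sketch. It assumes, as an illustration only, that after a monotone transformation the family $\{f_J\}$ behaves like an exponential distribution with rate $J$, so that the integral becomes $J^{\kappa} J_0 / (J_0 + \kappa (J - 1))$. This parametrization is not stated in this section; it is adopted here because it matches the reported numbers. The function name `kappa_exponential` is hypothetical and the code covers only the case $J > 1$, $J_0 \geq 1$.

```python
import math


def kappa_exponential(j, j0, iterations=60):
    """Positive root kappa of j**kappa * j0 / (j0 + kappa * (j - 1)) = 1.

    Assumes an exponential-rate parametrization of the family (see lead-in),
    with post-change design value j > 1 and true pre-change value j0 >= 1.
    """
    if not (j > 1.0 and j0 >= 1.0):
        raise ValueError("sketch only covers j > 1 and j0 >= 1")
    if math.log(j) >= (j - 1.0) / j0:
        raise ValueError("no positive root: the integral exceeds 1 for all kappa > 0")

    def h(kappa):
        # Logarithm of the integral; convex in kappa with h(0) = 0 and h(1) >= 0.
        return kappa * math.log(j) + math.log(j0) - math.log(j0 + kappa * (j - 1.0))

    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(kappa_exponential(2.5, 1.4), 2))   # uncertainty class boundary J0 = 1.4
    print(round(kappa_exponential(2.5, 1.31), 2))  # known pre-change value J0 = 1.31
```

With $J_0 = 1$ the root is $\kappa = 1$, matching the remark after Theorem 4.2 that the correctly specified case gives $\kappa^* = \kappa_g = 1$.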





Fig. 5. Delay-false alarm trade-off curves for a misspecification scenario:
GLR designed for $J = 1$ to $J = 2.5$, the same values that were used
in Fig.[^ The actual pre-change parameter $J_0 = 1.31$ because $\Sigma_0$ is block
sparse with block size 5. The post-change parameter is 3.149. The uncertainty
family for the parameter $J_0$ is $\mathcal{G} = \{g = f_{J_0} : |J_0 - 1| \leq 0.4\}$. From the
top, the first curve is the trade-off when the pre-change parameter is not known,
so that the threshold $A$ has to be selected using Theorem 4.2, part 2). The second
curve is the trade-off curve obtained using the knowledge of the pre-change
parameter. The post-change parameter is always unknown. The third curve is
the actual performance of the rule $\tau_G$ when $J_0 = 1.31$. The fourth curve, and
the first from the bottom, is the performance of $\tau_G$ when $J_0 = 1$.

VI. Conclusions and Future Work 
We have introduced a novel summary statistic based on
correlation mining and hub discovery for performing non-parametric
quickest change detection (QCD) on a sequence
of large scale random matrices. The proposed QCD algorithm
is strongly optimal in the sense of Lorden [11] and Pollak [17]
among all detection algorithms that use the proposed summary
statistic. Future work will include extensions to local summary
statistics and experiments with QCD in real applications that
yield sequences of large scale random matrix measurements.


Appendix 

Proof of Theorem 4.2: As shown in [11], the key to
analysis of $\tau_G$ is the following one-sided GLR test

$$N = \inf\left\{ m \geq 1 : \sup_{\theta : |\theta - \theta_0| \geq \epsilon} \, \sum_{i=1}^{m} \log \frac{f_\theta(Y_i)}{f_{\theta_0}(Y_i)} > A \right\}; \tag{27}$$

specifically, for any density $g$,

$$P_g(N < \infty) \leq \alpha \;\Rightarrow\; E_g[\tau_G] \geq \frac{1}{\alpha}, \tag{28}$$

where $P_g$ is the probability measure under which all the
observations $\{Y_m\}$ have density $g$, and $E_g$ is the corresponding
expectation. We thus focus on obtaining a bound on $P_g(N < \infty)$.

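In reference to this we define the one-sided test between $\theta$
vs $\theta_0$ as

$$\nu(f_\theta, f_{\theta_0}) = \inf\left\{ m \geq 1 : \sum_{i=1}^{m} \log \frac{f_\theta(Y_i)}{f_{\theta_0}(Y_i)} > A \right\}. \tag{29}$$

From Theorem 3.4 in [23] we know that if there exists a
positive constant $\kappa_{\theta,g} > 0$ such that (see (19))

$$\int \left( \frac{f_\theta(y)}{f_{\theta_0}(y)} \right)^{\kappa_{\theta,g}} g(y) \, d\mu(y) = 1, \tag{30}$$

then

$$P_g(\nu(f_\theta, f_{\theta_0}) < \infty) \leq e^{-\kappa_{\theta,g} A}. \tag{31}$$

The basic idea behind (31) is that if (30) is true, then we can
define a density $g_{\kappa_{\theta,g}} = (f_\theta / f_{\theta_0})^{\kappa_{\theta,g}} \, g$, and the test
$\nu(f_\theta, f_{\theta_0})$ is equivalent to $\nu(g_{\kappa_{\theta,g}}, g)$ with threshold $\kappa_{\theta,g} A$.
The estimate (31) is then just the classical estimate of the
probability for a one-sided test to stop in finite time under the
null hypothesis, obtained by applying Theorem 1.1 in [23]
(also see Theorem 3.1 in [23]).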

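We will now use (31) to obtain an upper bound on $P_g(N < \infty)$.
Towards that end, we revisit Section 3 of [11] and modify
the proof there appropriately to suit our needs. Note that

$$\log \frac{f_\theta(y)}{f_{\theta_0}(y)} = (\theta - \theta_0) T(y) - b(\theta) + b(\theta_0).$$

Thus, with

$$S_m = \sum_{i=1}^{m} T(Y_i),$$

we have

$$\sup_{\theta : |\theta - \theta_0| \geq \epsilon} \, \sum_{i=1}^{m} \log \frac{f_\theta(Y_i)}{f_{\theta_0}(Y_i)} = \sup_{\theta : |\theta - \theta_0| \geq \epsilon} \, (\theta - \theta_0) S_m - m (b(\theta) - b(\theta_0)).$$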

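Now,

$$\left\{ \sup_{\theta : |\theta - \theta_0| \geq \epsilon} (\theta - \theta_0) S_m - m (b(\theta) - b(\theta_0)) > A \right\} = \left\{ S_m > \inf_{\theta : \theta \geq \theta_0 + \epsilon} \frac{A + m (b(\theta) - b(\theta_0))}{\theta - \theta_0} \right\} \cup \left\{ S_m < \sup_{\theta : \theta \leq \theta_0 - \epsilon} \frac{A + m (b(\theta) - b(\theta_0))}{\theta - \theta_0} \right\}. \tag{32}$$

This is because if the left hand side is true, then there is $\theta_1$
such that $(\theta_1 - \theta_0) S_m - m (b(\theta_1) - b(\theta_0)) > A$, and $\theta_1$ could
be either greater or less than $\theta_0$, making $\theta_1 - \theta_0$ positive or
negative. Thus, the left hand side is a subset of the right hand
side. An identical argument given in reverse justifies that the
right hand side is a subset of the left.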


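By Assumption 1 there exists a positive constant $\kappa_{\theta,g}$
satisfying (19) for every $\theta \in \Theta$ with $|\theta - \theta_0| \geq \epsilon$. Furthermore,
there exists $\kappa_g$ such that

$$0 < \kappa_g \leq \inf_{\theta \in \Theta : |\theta - \theta_0| \geq \epsilon} \kappa_{\theta,g}. \tag{33}$$

With this assumption we have an upper bound on the estimate
in (31): $\forall \theta \in \Theta$, $|\theta - \theta_0| \geq \epsilon$,

$$P_g(\nu(f_\theta, f_{\theta_0}) < \infty) \leq e^{-\kappa_{\theta,g} A} \leq e^{-\kappa_g A}. \tag{34}$$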


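Now consider the infimum on the right hand side of (32).
Let the infimum be approached along the sequence $\{\theta_\ell\}$. Then,

$$\begin{aligned}
P_g\left( S_m > \inf_{\theta : \theta \geq \theta_0 + \epsilon} \frac{A + m (b(\theta) - b(\theta_0))}{\theta - \theta_0} \right)
&= \lim_{\ell \to \infty} P_g\left( S_m > \frac{A + m (b(\theta_\ell) - b(\theta_0))}{\theta_\ell - \theta_0} \right) \\
&\leq \limsup_{\ell \to \infty} P_g\big( \nu(f_{\theta_\ell}, f_{\theta_0}) \leq m \big) \\
&\leq \limsup_{\ell \to \infty} P_g\big( \nu(f_{\theta_\ell}, f_{\theta_0}) < \infty \big) \\
&\leq e^{-\kappa_g A},
\end{aligned} \tag{35}$$

where the last inequality follows from (34). An almost identical
argument yields the same bound on the probability of the
other event involving a supremum in (32). Thus,

$$P_g(N = m) \leq P_g\left( \sup_{\theta : |\theta - \theta_0| \geq \epsilon} (\theta - \theta_0) S_m - m (b(\theta) - b(\theta_0)) > A \right) \leq 2 e^{-\kappa_g A}. \tag{36}$$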


By Assumption 3, $I(\theta)$, the Kullback-Leibler divergence
between $f_\theta$ and $f_{\theta_0}$, increases with $|\theta - \theta_0|$. Because of this
assumption, if

$$\frac{A}{m} < \min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}, \tag{37}$$

then the infimum and supremum on the right hand side of (32)
are achieved at the boundaries $\theta_0 + \epsilon$ and $\theta_0 - \epsilon$, respectively.
To see this, we differentiate the term inside the infimum:

$$\frac{d}{d\theta} \, \frac{A + m (b(\theta) - b(\theta_0))}{\theta - \theta_0} = \frac{m \left[ (\theta - \theta_0) b'(\theta) - (b(\theta) - b(\theta_0)) \right] - A}{(\theta - \theta_0)^2} = \frac{m I(\theta) - A}{(\theta - \theta_0)^2}. \tag{38}$$

Thus, setting the derivative to zero shows that the local interior
minima $\theta^*$ must satisfy

$$I(\theta^*) = A/m. \tag{39}$$
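The last equality in (38) uses a standard identity for the family (17). As a brief worked step, assuming $b$ is differentiable and that differentiation under the integral sign is permitted, the mean of the sufficient statistic is $E_\theta[T(Y)] = b'(\theta)$, so the Kullback-Leibler divergence can be written in terms of $b$ alone:

```latex
\begin{align*}
I(\theta)
  &= E_\theta\!\left[ \log \frac{f_\theta(Y)}{f_{\theta_0}(Y)} \right] \\
  &= (\theta - \theta_0)\, E_\theta[T(Y)] - \big( b(\theta) - b(\theta_0) \big) \\
  &= (\theta - \theta_0)\, b'(\theta) - \big( b(\theta) - b(\theta_0) \big).
\end{align*}
```

Substituting this into the numerator of the derivative gives $m I(\theta) - A$, which is negative for $A/m > I(\theta)$ and positive for $A/m < I(\theta)$.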


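Since $\Theta$ is assumed to be an interval and the term inside
the infimum is continuous, it must achieve its minimum
on $[\theta_0 + \epsilon, \theta_M]$, where $\theta_M$ is the rightmost point of $\Theta$.
The condition (37) guarantees that the minimum cannot be
achieved on $(\theta_0 + \epsilon, \theta_M)$. Furthermore, it cannot be achieved
at $\theta_M$ since otherwise we would have the contradiction

$$\frac{A}{m} < I(\theta_0 + \epsilon) \leq I(\theta_M) \leq \frac{A}{m},$$

where the last inequality follows from the standard necessary
condition on optimization over convex sets; see Proposition
2.1.2 in [24]. Almost identical arguments allow us to prove
that the supremum on the right hand side of (32) is achieved
at $\theta_0 - \epsilon$ if (37) is true.

Define

$$\mathcal{M} = \left\{ m : m > \frac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} \right\}.$$

We have the estimate

$$\begin{aligned}
P_g\left( \frac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} < N < \infty \right)
&= P_g\left( \bigcup_{m \in \mathcal{M}} \{N = m\} \right) \\
&\leq P_g\left( \bigcup_{m \in \mathcal{M}} \left\{ \sup_{\theta : |\theta - \theta_0| \geq \epsilon} (\theta - \theta_0) S_m - m (b(\theta) - b(\theta_0)) > A \right\} \right) \\
&= P_g\left( \bigcup_{m \in \mathcal{M}} \left\{ S_m > \inf_{\theta : \theta \geq \theta_0 + \epsilon} \frac{A + m (b(\theta) - b(\theta_0))}{\theta - \theta_0} \right\} \cup \left\{ S_m < \sup_{\theta : \theta \leq \theta_0 - \epsilon} \frac{A + m (b(\theta) - b(\theta_0))}{\theta - \theta_0} \right\} \right) \\
&= P_g\left( \bigcup_{m \in \mathcal{M}} \left\{ S_m > \frac{A + m (b(\theta_0 + \epsilon) - b(\theta_0))}{\theta_0 + \epsilon - \theta_0} \right\} \cup \left\{ S_m < \frac{A + m (b(\theta_0 - \epsilon) - b(\theta_0))}{\theta_0 - \epsilon - \theta_0} \right\} \right) \\
&\leq P_g\left( \bigcup_{m \in \mathcal{M}} \{ \nu(f_{\theta_0 + \epsilon}, f_{\theta_0}) \leq m \} \right) + P_g\left( \bigcup_{m \in \mathcal{M}} \{ \nu(f_{\theta_0 - \epsilon}, f_{\theta_0}) \leq m \} \right) \\
&\leq P_g\big( \nu(f_{\theta_0 + \epsilon}, f_{\theta_0}) < \infty \big) + P_g\big( \nu(f_{\theta_0 - \epsilon}, f_{\theta_0}) < \infty \big) \\
&\leq 2 e^{-\kappa_g A}.
\end{aligned} \tag{40}$$

Thus, similar to the estimate in [11], we have the estimate

$$\begin{aligned}
P_g(N < \infty)
&= \sum_{m=1}^{\left\lfloor \frac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} \right\rfloor} P_g(N = m) + P_g\left( \frac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} < N < \infty \right) \\
&\leq 2 e^{-\kappa_g A} \left( \frac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} + 1 \right),
\end{aligned} \tag{41}$$

where the sum is bounded using (36) and the last term using (40).
From (28),

$$E_g[\tau_G] \geq \frac{e^{\kappa_g A}}{2 \left( \dfrac{A}{\min\{I(\theta_0 + \epsilon), I(\theta_0 - \epsilon)\}} + 1 \right)}. \tag{42}$$

This proves the first part of the theorem. The second part is
now obvious. ■

Proof of Theorem 4.3: Let $\theta_g$ be as in Assumption 4.
Then note that

$$\begin{aligned}
\tau_G &= \inf\left\{ m \geq 1 : \max_{1 \leq k \leq m} \, \sup_{\theta : |\theta - \theta_0| \geq \epsilon} \, \sum_{i=k}^{m} \log \frac{f_\theta(Y_i)}{f_{\theta_0}(Y_i)} > A \right\} \\
&\leq \inf\left\{ m \geq 1 : \sum_{i=1}^{m} \log \frac{f_{\theta_g}(Y_i)}{f_{\theta_0}(Y_i)} > A \right\}.
\end{aligned} \tag{43}$$

Assumption 4 implies that the drift of the random walk with
increments $\log [f_{\theta_g}(Y_i) / f_{\theta_0}(Y_i)]$ is positive when samples are drawn
from $g$. The theorem now follows from Proposition 8.21 in
[25]: as $A \to \infty$,

$$E_g[\tau_G] \leq \frac{A}{\int \log[f_{\theta_g}(y) / f_{\theta_0}(y)] \, g(y) \, d\mu(y)} \, (1 + o(1)). \tag{44}$$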